We present BotSIM, a data-efficient end-to-end Bot SIMulation toolkit for commercial text-based task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that infers semantic-level dialog acts and entities from bot definitions and generates user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) that simulates conversations with the dialog agents; 3) a Remediator that analyzes the simulated conversations, visualizes the bot health reports, and provides actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM's effectiveness in end-to-end evaluation, remediation, and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM's "generation-simulation-remediation" paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing the manual effort of creating test cases; 2) enabling a holistic gauge of the bot in terms of NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLi5iSoly30. We have open-sourced the toolkit at https://github.com/salesforce/botsim
translated by 谷歌翻译
AI methods are used in societally important settings, ranging from credit to employment to housing, and it is crucial to provide fairness in regard to algorithmic decision making. Moreover, many settings are dynamic, with populations responding to sequential decision policies. We introduce the study of reinforcement learning (RL) with stepwise fairness constraints, requiring group fairness at each time step. Our focus is on tabular episodic RL, and we provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation. Our framework provides useful tools to study the impact of fairness constraints in sequential settings and brings up new challenges in RL.
This paper focuses on survival time analysis for lung cancer. Although much progress has been made on this problem in recent years, the performance of existing methods is still far from satisfactory. Traditional and some deep learning-based survival time analyses for lung cancer rely mostly on textual clinical information such as staging, age, and histology. Unlike existing methods that predict from a single modality, we observe that a human clinician usually combines multimodal data, such as textual clinical data and visual scans, to estimate survival time. Motivated by this, we contribute a smart cross-modality network for survival analysis, named Lite-ProSENet, that simulates a human's manner of decision making. Extensive experiments were conducted using data from 422 NSCLC patients from The Cancer Imaging Archive (TCIA). The results show that Lite-ProSENet compares favorably against all comparison methods and achieves a new state of the art with a concordance of 89.3%. The code will be made publicly available.
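For readers unfamiliar with the concordance metric quoted above, the following is a minimal sketch of Harrell's concordance index, which is one common definition; the abstract does not say which variant the authors used. The function name, the toy data, and the choice to skip tied survival times are illustrative assumptions, not the authors' evaluation code.

```python
from itertools import combinations

def concordance_index(times, risks, events):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly.

    times  : observed survival or censoring times
    risks  : predicted risk scores (higher means shorter expected survival)
    events : 1 if death was observed, 0 if the patient was censored
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # the earlier time is censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0   # shorter survival was given the higher risk
        elif risks[first] == risks[second]:
            concordant += 0.5   # tied risk scores count as half
    return concordant / comparable

# Toy data (hypothetical): four patients, the third one is censored.
times = [5, 10, 12, 20]
risks = [0.9, 0.4, 0.6, 0.1]
events = [1, 1, 0, 1]
print(concordance_index(times, risks, events))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking. The sketch raises a division error if no pair is comparable.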
A reconstruction attack on a private dataset $D$ takes as input some publicly accessible information about the dataset and produces a list of candidate elements of $D$. We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of $D$ from aggregate query statistics $Q(D)\in \mathbb{R}^m$, but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset $D$ was sampled, demonstrating that they are exploiting information in the aggregate statistics $Q(D)$, and not simply the overall structure of the distribution. In other words, the queries $Q(D)$ are permitting reconstruction of elements of this dataset, not the distribution from which $D$ was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.
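We present a differentially private algorithm that generates synthetic data useful for multiple tasks at once: marginal queries and multitask machine learning (ML). A key innovation of our algorithm is its ability to handle numerical features directly, in contrast to many related prior approaches, which must first convert numerical features into high-cardinality categorical features via a binning strategy. Higher binning granularity is needed for better accuracy, but it hurts scalability. Eliminating the need for binning lets us produce synthetic data that preserves a large number of statistical queries, such as marginals on numerical features and class-conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in the real and the synthetic data. This is the property needed to train a linear classifier in a multitask setting. Our algorithm also lets us produce high-quality synthetic data for mixed marginal queries, which combine categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques and provides significant accuracy improvements on both marginal queries and linear prediction tasks for mixed-type datasets.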
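To make the class-conditional linear threshold query concrete, here is a minimal sketch of how such a query could be evaluated on a dataset. The function name, the normalization by the total number of rows, and the toy data are assumptions made for illustration; the abstract does not specify the exact definition, and this is not the authors' algorithm.

```python
import numpy as np

def class_conditional_threshold_query(X, y, label, w, b):
    """Fraction of all rows that carry class `label` and lie in the half-space w.x > b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    above = X @ np.asarray(w, dtype=float) > b   # membership in the half-space
    return float(np.mean(above & (y == label)))

# Toy "real" data (hypothetical): two numerical features and a binary label.
X_real = [[1, 2], [3, 1], [0, 0], [2, 2]]
y_real = [1, 0, 1, 1]
w, b = [1.0, 1.0], 2.5

for label in (0, 1):
    print(label, class_conditional_threshold_query(X_real, y_real, label, w, b))
```

Synthetic data preserves the query when the same call on the synthetic rows returns a value close to the one on the real rows, for every label and every half-space of interest.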
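A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR). We derive a general, game-theoretic strategy for satisfying CMR that scales to nonlinear problems, is amenable to gradient-based optimization, and can account for finite-sample uncertainty. We recover the approaches of Dikkala et al. and Dai et al. as special cases of our general framework, then detail various extensions and how to efficiently solve the game defined by CMR.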
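We consider imitation learning problems in which the expert has access to a per-episode context that is hidden from the learner, both in the demonstrations and at test time. Although the learner may not be able to reproduce expert behavior accurately early in an episode, by considering the entire history of states and actions it may eventually identify the context and act as the expert would. We prove that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped than off-policy methods to handle these asymptotically realizable problems, and that they avoid the latching behavior (naive repetition of past actions) that plagues the latter. We conduct experiments in a toy bandit domain that examine whether off-policy approaches can asymptotically match expert performance, in contrast to the uniformly good performance of on-policy approaches. We demonstrate that on several continuous control tasks, on-policy approaches are able to use history to identify the context, whereas off-policy approaches actually perform worse when given access to history.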
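Photorealistic avatars of human faces have come a long way in recent years, but research in this area is limited by the lack of publicly available, high-quality datasets. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities for neural face rendering research. We introduce Mugsy, a large-scale multi-camera apparatus that captures high-resolution synchronized videos of a facial performance. The goal of Multiface is to close the gap in access to high-quality data in the academic community and to enable research in VR telepresence. Along with the release of the dataset, we conduct ablation studies on how different model architectures affect the model's ability to interpolate to novel viewpoints and expressions. With a conditional VAE model as our baseline, we find that adding spatial bias, a texture warp field, and residual connections improves performance on novel view synthesis. Our code and data are available at: https://github.com/facebookresearch/multiface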
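While the application of differential privacy (DP) has been well studied in federated learning (FL), there is a lack of work considering DP for cross-silo FL, a setting characterized by a limited number of clients, each holding data from many data subjects. In cross-silo FL, the usual notion of client-level privacy is less suitable, because real-world privacy regulations typically concern the data subjects within a silo rather than the silos themselves. In this work, we instead consider the more realistic notion of silo-specific item-level privacy, in which each silo sets its own privacy target for its local examples. In this setting, we reconsider the role of personalization in federated learning. In particular, we show that mean-regularized multi-task learning (MR-MTL), a simple personalization framework, is a strong baseline for cross-silo FL: under stronger privacy, silos are further incentivized to "federate" with each other to mitigate DP noise, which leads to consistent improvements over standard baseline methods. We provide a thorough empirical study of competing methods, as well as a theoretical characterization of MR-MTL for a mean estimation problem, highlighting the interplay between privacy and cross-silo data heterogeneity. Our work aims to establish baselines for private cross-silo FL and to identify key directions for future work in this area.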
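As a rough illustration of the mean-regularized idea on a mean estimation problem, the sketch below assumes that each silo k minimizes 0.5*(w_k - m_k)^2 + 0.5*lam*(w_k - w_bar)^2, where m_k is the silo's local mean and w_bar is the average of the personalized estimates. This quadratic objective, the function name, and the omission of DP noise are simplifying assumptions for illustration; they are not taken from the abstract.

```python
import numpy as np

def mr_mtl_mean_estimates(local_means, lam):
    """Personalized per-silo estimates under the assumed mean-regularized objective."""
    local_means = np.asarray(local_means, dtype=float)
    # Averaging the per-silo optimality conditions shows that the shared mean
    # w_bar equals the average of the local means at the fixed point.
    w_bar = local_means.mean()
    # Closed-form minimizer of 0.5*(w - m_k)**2 + 0.5*lam*(w - w_bar)**2.
    return (local_means + lam * w_bar) / (1.0 + lam)

local_means = [1.0, 2.0, 6.0]           # hypothetical local means of three silos
for lam in (0.0, 1.0, 100.0):
    print(lam, mr_mtl_mean_estimates(local_means, lam))
```

With lam = 0 every silo keeps its purely local estimate, and as lam grows the estimates are pulled toward the shared mean, so lam interpolates between local training and full federation.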
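There is a disconnect between how researchers and practitioners handle privacy-utility tradeoffs. Researchers primarily operate from a privacy-first perspective, setting strict privacy requirements and minimizing risk subject to these constraints. Practitioners often prefer an accuracy-first perspective, and may be satisfied with the greatest privacy they can get subject to obtaining sufficiently small error. Ligett et al. introduced a "noise reduction" algorithm to address the latter perspective. The authors show that by adding correlated Laplace noise and progressively reducing it on demand, it is possible to produce a sequence of increasingly accurate estimates of a private parameter while paying a privacy cost only for the least noisy iterate released. In this work, we generalize noise reduction to the setting of Gaussian noise and introduce the Brownian mechanism. The Brownian mechanism first adds high-variance Gaussian noise corresponding to the final point of a simulated Brownian motion. Then, at the practitioner's discretion, the noise is gradually decreased by tracing back along the Brownian path to an earlier time. Our mechanism applies more naturally to the common setting of bounded $\ell_2$-sensitivity, empirically outperforms existing work on common statistical tasks, and provides customizable control of privacy loss over the entire interaction with the practitioner. We complement our Brownian mechanism with ReducedAboveThreshold, a generalization of the classical AboveThreshold algorithm that provides adaptive privacy guarantees. Overall, our results show that one can meet utility constraints while still maintaining strong levels of privacy.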
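The noise-reduction step can be sketched as follows: simulate a standard Brownian motion at a grid of times, release the value perturbed by the final (noisiest) point first, then walk back along the same path. This is a minimal scalar sketch under stated assumptions. The function name and the time grid are hypothetical, and the code performs no privacy accounting or calibration of noise to sensitivity.

```python
import numpy as np

def brownian_noise_reduction(value, times, rng):
    """Return (time, noisy value) pairs, noisiest release first.

    times : strictly increasing positive times t_1 < ... < t_k; the release
            at time t carries Gaussian noise with variance t.
    """
    times = np.asarray(times, dtype=float)
    # Standard Brownian motion has independent increments with
    # B(t_i) - B(t_{i-1}) ~ N(0, t_i - t_{i-1}) and B(0) = 0.
    gaps = np.diff(times, prepend=0.0)
    path = np.cumsum(rng.normal(0.0, np.sqrt(gaps)))
    # Start from the final point of the path and trace back to earlier times.
    return [(t, value + b) for t, b in zip(times[::-1], path[::-1])]

rng = np.random.default_rng(0)
for t, estimate in brownian_noise_reduction(10.0, [0.5, 1.0, 2.0, 4.0], rng):
    print(f"noise variance {t:.1f}: estimate {estimate:.3f}")
```

Because every release reuses the same simulated path, the noise terms are correlated across releases. A practitioner would stop at the first release that is accurate enough, which is the earliest time on the path they ever see.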